Workflow

Single-cell transcriptomics of cells infected with barcoded influenza virions. This experiment allows accurate detection of the number of unique virions infecting each cell and of the resulting impact on that cell's transcriptome. Single-cell transcriptomics was performed using the 10X Chromium platform.

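The basic steps in the analysis are as follows:

  • Generate FASTQ files for each 10X Illumina run with cellranger mkfastq and analyze the quality-control statistics.
  • Download the Canis familiaris (CanFam3.1) genome and GTF from Ensembl, combine them with the influenza (CA09) genome and GTF, and build a STAR reference genome.
  • Download the 10X cell-barcode whitelist.
  • Align the 10X reads with STARsolo, index the BAM files, and summarize alignment statistics and transcript coverage.
  • Analyze coverage of the viral genes and reads with gaps.
  • Count viral tags and viral barcodes for each sample.
  • Analyze the cell-gene matrix for each sample.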

Detailed software versions can be found under Rules.

Results

File Size Description Job properties
fastq10x_qc_analysis.html 350.7 kB

Analysis of quality-control statistics from the generation of the 10X FASTQ files using cellranger mkfastq, in the form of an HTML rendering of a Jupyter notebook.

Rule: fastq10x_qc_analysis
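
With the v3 kit, read 1 (the _all_R1.fastq.gz files written by make_fastq10x) begins with the cell barcode followed by the UMI. In config.yaml, cb_len_10x is 16 and umi_len_10x is 12, and these add up to barcode_total_length: 28. The sketch below shows how each R1 sequence splits into those two parts. It assumes the barcode starts at the first base of R1. The function names are hypothetical and are not the workflow's own code.

```python
import gzip
from typing import Iterable, Iterator, Tuple

CB_LEN = 16   # cb_len_10x in config.yaml
UMI_LEN = 12  # umi_len_10x in config.yaml (v3 kit); CB_LEN + UMI_LEN = 28


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read name, sequence, quality) from FASTQ text lines.

    Assumes well-formed 4-line records; a truncated final record is dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups consecutive lines into records
    for header, seq, _plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header[1:], seq, qual


def split_r1(seq: str) -> Tuple[str, str]:
    """Split a 10X v3 read 1 sequence into (cell barcode, UMI)."""
    if len(seq) < CB_LEN + UMI_LEN:
        raise ValueError(f"read 1 is shorter than {CB_LEN + UMI_LEN} nt: {seq!r}")
    return seq[:CB_LEN], seq[CB_LEN:CB_LEN + UMI_LEN]


def iter_cb_umi(r1_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (cell barcode, UMI) for every record of a gzipped R1 FASTQ file."""
    with gzip.open(r1_path, "rt") as handle:
        for _name, seq, _qual in parse_fastq(handle):
            yield split_r1(seq)
```

Any bases after position 28 in R1 are ignored here. STARsolo receives the same layout through its own barcode-length settings.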
File Size Description Job properties
align_fastq10x_summary.html 427.3 kB

Statistics from the STARsolo alignments of the 10X Illumina FASTQ files in the form of an HTML rendering of a Jupyter notebook.

Rule: align_fastq10x_summary
fastq10x_transcript_coverage.html 526.4 kB

Coverage plots for selected transcripts in the aligned 10X Illumina reads, in the form of an HTML rendering of a Jupyter notebook.

Rule: fastq10x_transcript_coverage
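
The alignment summary draws on STARsolo's per-sample Solo.out/Gene/Summary.csv (an output of align_fastq10x). In recent STAR versions this file is, to our understanding, a headerless list of "statistic,value" lines. The sketch below loads it into a dictionary so samples can be compared. It assumes that format. The statistic names in the usage check are illustrative only, because the exact set depends on the STAR version.

```python
import csv
from typing import Dict, Iterable


def parse_solo_summary(lines: Iterable[str]) -> Dict[str, float]:
    """Parse STARsolo Summary.csv lines of the form 'statistic,value'.

    Blank lines, malformed rows, and non-numeric values are skipped.
    """
    stats: Dict[str, float] = {}
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # blank or unexpected line
        key, value = row[0].strip(), row[1].strip()
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # non-numeric value
    return stats


def read_solo_summary(path: str) -> Dict[str, float]:
    """Read e.g. results/aligned_fastq10x/wt_virus_pilot/Solo.out/Gene/Summary.csv."""
    with open(path, newline="") as handle:
        return parse_solo_summary(handle)
```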
File Size Description Job properties
pilot_20200116_analyze_cell_gene_matrix.html 459.7 kB

Analysis of the cell-gene matrix for pilot_20200116, in the form of an HTML rendering of a Jupyter notebook.

Rule: analyze_cell_gene_matrix
Wildcards: sample10x=pilot_20200116
wt_virus_pilot_analyze_cell_gene_matrix.html 498.1 kB

Analysis of the cell-gene matrix for wt_virus_pilot, in the form of an HTML rendering of a Jupyter notebook.

Rule: analyze_cell_gene_matrix
Wildcards: sample10x=wt_virus_pilot
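
STARsolo writes the filtered cell-gene matrix (Solo.out/Gene/filtered/matrix.mtx) in Matrix Market coordinate format. Genes are rows, named in features.tsv, and cells are columns, named in barcodes.tsv. The sketch below reads that layout with the standard library and computes, for each cell, the fraction of counts from viral genes. The set of viral feature IDs is a hypothetical input: which IDs flu-CA09.gtf contributes to features.tsv is not shown here. In practice, scipy.io.mmread reads the same file into a sparse matrix.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def parse_mtx(lines: Iterable[str]) -> Tuple[int, int, List[Tuple[int, int, int]]]:
    """Parse a Matrix Market coordinate file into (n_rows, n_cols, entries).

    Entries are (row, col, value) with 0-based indices.
    """
    it = iter(lines)
    header = next(it, "")
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError(f"unsupported Matrix Market header: {header!r}")
    n_rows = n_cols = -1
    entries: List[Tuple[int, int, int]] = []
    for line in it:
        if line.startswith("%") or not line.strip():
            continue  # comment or blank line
        fields = line.split()
        if n_rows < 0:
            # size line: rows, columns, number of non-zero entries
            n_rows, n_cols = int(fields[0]), int(fields[1])
            continue
        # Matrix Market indices are 1-based
        entries.append((int(fields[0]) - 1, int(fields[1]) - 1, int(float(fields[2]))))
    return n_rows, n_cols, entries


def read_tsv_column(path: str, column: int = 0) -> List[str]:
    """Read one column of a TSV file, e.g. gene IDs from features.tsv."""
    with open(path) as handle:
        return [line.rstrip("\n").split("\t")[column] for line in handle if line.strip()]


def viral_fraction_per_cell(mtx_lines: Iterable[str], feature_ids: List[str],
                            barcodes: List[str], viral_ids: Set[str]) -> Dict[str, float]:
    """Fraction of each cell's counts assigned to features in `viral_ids`."""
    n_rows, n_cols, entries = parse_mtx(mtx_lines)
    if n_rows != len(feature_ids) or n_cols != len(barcodes):
        raise ValueError("matrix shape does not match features/barcodes")
    total: Dict[int, int] = defaultdict(int)
    viral: Dict[int, int] = defaultdict(int)
    for row, col, value in entries:
        total[col] += value
        if feature_ids[row] in viral_ids:
            viral[col] += value
    return {bc: (viral[i] / total[i] if total[i] else 0.0) for i, bc in enumerate(barcodes)}
```

For a real sample, open results/aligned_fastq10x/wt_virus_pilot/Solo.out/Gene/filtered/matrix.mtx. Pass the open file together with read_tsv_column() on the neighboring features.tsv and barcodes.tsv.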
File Size Description Job properties
count_viralbc_fastq10x-pilot_20200116.html 347.0 kB

Counting of viral barcodes for pilot_20200116, in the form of an HTML rendering of a Jupyter notebook.

Rule: count_viralbc_fastq10x
Wildcards: sample10x=pilot_20200116
count_viralbc_fastq10x-wt_virus_pilot.html 344.0 kB

Counting of viral barcodes for wt_virus_pilot, in the form of an HTML rendering of a Jupyter notebook.

Rule: count_viralbc_fastq10x
Wildcards: sample10x=wt_virus_pilot
count_viraltags_fastq10x-pilot_20200116.html 378.6 kB

Counting of the viral tags for pilot_20200116 in the form of an HTML rendering of a Jupyter notebook.

Rule: count_viraltags_fastq10x
Wildcards: sample10x=pilot_20200116
count_viraltags_fastq10x-wt_virus_pilot.html 375.8 kB

Counting of the viral tags for wt_virus_pilot in the form of an HTML rendering of a Jupyter notebook.

Rule: count_viraltags_fastq10x
Wildcards: sample10x=wt_virus_pilot
gap_analysis.html 949.9 kB

Analysis of gapped reads among the aligned 10X Illumina reads, in the form of an HTML rendering of a Jupyter notebook.

Rule: analyze_gaps
viral_fastq10x_coverage.html 706.0 kB

Analysis of the coverage of the viral genes (including viral tags and viral barcodes) in the aligned 10X Illumina FASTQ reads in the form of an HTML rendering of a Jupyter notebook.

Rule: viral_fastq10x_coverage
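
The viral-barcode counts estimate how many distinct virions infected each cell. The sketch below shows one way to get from per-read assignments to such counts. It assumes the reads covering the viral barcode have already been reduced to (cell barcode, UMI, viral barcode) triples; the barcode positions are listed in viralbc_locs.csv. Reads that share a UMI are collapsed, and a viral barcode counts as present in a cell only if enough distinct UMIs support it. The min_umis threshold and function names are hypothetical. They are not necessarily what the count_viralbc_fastq10x notebook does.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_viral_barcodes(
    reads: Iterable[Tuple[str, str, str]], min_umis: int = 2
) -> Dict[str, Dict[str, int]]:
    """Return {cell barcode: {viral barcode: number of distinct UMIs}}.

    `reads` yields (cell_barcode, umi, viral_barcode) per read. Reads sharing
    a UMI count once, and viral barcodes with fewer than `min_umis` UMIs in a
    cell are dropped (cells left with none are omitted entirely).
    """
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cell, umi, viral_bc in reads:
        umis[(cell, viral_bc)].add(umi)
    counts: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (cell, viral_bc), umi_set in umis.items():
        if len(umi_set) >= min_umis:
            counts[cell][viral_bc] = len(umi_set)
    return dict(counts)


def virions_per_cell(counts: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Number of distinct viral barcodes per cell, a proxy for infecting virions."""
    return {cell: len(viral_bcs) for cell, viral_bcs in counts.items()}
```

Raising min_umis guards against spurious barcodes from sequencing errors or ambient RNA, at the cost of missing weakly expressed virions.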

Statistics

If the workflow was executed on a cluster or in the cloud, runtimes include time spent waiting in the queue.

Configuration

File Code
config.yaml
# YAML configuration file for the analysis

# max CPUs used by any rules
max_cpus: 16

# file specifying 10X Illumina runs
illumina_runs_10x: data/illumina_runs_10x.csv

# output directories
fastq10x_dir: results/fastq10x  # FASTQ files & QC stats for 10X Illumina runs
mkfastq10x_dir: results/fastq10x/mkfastq_output  # `cellranger mkfastq` output
genome_dir: results/genomes  # location of downloaded genomes and annotations
refgenome: results/genomes/refgenome  # STAR reference genome directory
aligned_fastq10x_dir: results/aligned_fastq10x  # aligned 10X Illumina reads
viral_fastq10x_dir: results/viral_fastq10x  # viral tags / barcodes in 10X reads
analysis_dir: results/analysis  # fine-grained analyses

# cellular genome and GTF ftp sites
cell_genome_ftp: ftp://ftp.ensembl.org/pub/release-98/fasta/canis_familiaris/dna/Canis_familiaris.CanFam3.1.dna.toplevel.fa.gz
cell_gtf_ftp: ftp://ftp.ensembl.org/pub/release-98/gtf/canis_familiaris/Canis_familiaris.CanFam3.1.98.gtf.gz

# viral genome (FASTA), GTF, and Genbank file locations
viral_genome: data/flu_sequences/flu-CA09.fasta
viral_gtf: data/flu_sequences/flu-CA09.gtf
viral_genbank: data/flu_sequences/flu-CA09.gb

# file giving nucleotide identities at viral tag sites
viraltag_identities: data/flu_sequences/flu-CA09_viral_tags.yaml

barcode_total_length: 28  # length of UMI + cell barcode (CB)

# STAR alignment parameters. These settings reduce the penalty for
# non-canonical splice sites, which is probably bad for mapping cellular
# reads but is good for mapping viral reads, which will have deletions
# not corresponding to splice sites.
scoreGapNoncan: -4
scoreGapGCAG: -4
scoreGapATAC: -4

# URL location of 10X barcode whitelist: **this is for the v3 kit**
cb_whitelist_10x_url: https://github.com/10XGenomics/cellranger/raw/master/lib/python/cellranger/barcodes/3M-february-2018.txt.gz
cb_whitelist_10x: results/aligned_fastq10x/cb_whitelist_10x.txt

cb_len_10x: 16  # length of 10X cell barcode
umi_len_10x: 12  # length of 10X UMI: **this is for the v3 kit**

expect_ncells: 6000  # expected cells per 10X run, for "knee" cell calling
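
The setting expect_ncells: 6000 is used for "knee" cell calling. The sketch below implements one common expected-cells rule: a simplified version of the CellRanger 2.2-style filter, which STARsolo also offers as --soloCellFilter CellRanger2.2. It is an assumption that this workflow uses this exact rule and not another knee method. The defaults quantile=0.99 and max_ratio=10 are that method's usual values, not settings from config.yaml. The input is a list of per-barcode UMI counts, such as those STARsolo summarizes in UMIperCellSorted.txt.

```python
from typing import List


def call_cells(umi_counts: List[int], expect_ncells: int = 6000,
               quantile: float = 0.99, max_ratio: float = 10.0) -> int:
    """Return how many barcodes are called as cells.

    Simplified CellRanger 2.2-style rule: take the UMI count at the
    `quantile` position among the top `expect_ncells` barcodes as a robust
    maximum, then keep barcodes with at least robust_max / max_ratio UMIs.
    """
    if expect_ncells <= 0:
        raise ValueError("expect_ncells must be positive")
    counts = sorted(umi_counts, reverse=True)
    if not counts:
        return 0
    top = counts[:expect_ncells]
    # e.g. index 60 when expect_ncells = 6000 and quantile = 0.99
    idx = min(len(top) - 1, round(len(top) * (1 - quantile)))
    robust_max = top[idx]
    threshold = robust_max / max_ratio
    return sum(1 for c in counts if c >= threshold)
```

Using a high quantile of the expected cells, rather than the single largest count, keeps one unusually large barcode from raising the threshold.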

Rules

Rule Jobs Output Singularity Conda environment Code
fastq10x_qc_analysis 1
  • results/fastq10x/fastq10x_qc_analysis.ipynb
  • results/fastq10x/fastq10x_qc_analysis.html
align_fastq10x_summary 1
  • results/aligned_fastq10x/align_fastq10x_summary.ipynb
  • results/aligned_fastq10x/align_fastq10x_summary.html
fastq10x_transcript_coverage 1
  • results/aligned_fastq10x/fastq10x_transcript_coverage.ipynb
  • results/aligned_fastq10x/fastq10x_transcript_coverage.html
viral_fastq10x_coverage 1
  • results/viral_fastq10x/viraltag_locs.csv
  • results/viral_fastq10x/viralbc_locs.csv
  • results/viral_fastq10x/viral_fastq10x_coverage.ipynb
  • results/viral_fastq10x/viral_fastq10x_coverage.html
analyze_gaps 1
  • results/viral_fastq10x/gap_analysis.ipynb
  • results/viral_fastq10x/gap_analysis.html
count_viraltags_fastq10x 2
  • results/viral_fastq10x/count_viraltags_fastq10x-wt_virus_pilot.ipynb
  • results/viral_fastq10x/count_viraltags_fastq10x-wt_virus_pilot.html
  • results/viral_fastq10x/viraltag_counts_wt_virus_pilot.csv
  • results/viral_fastq10x/count_viraltags_fastq10x-pilot_20200116.ipynb
  • results/viral_fastq10x/count_viraltags_fastq10x-pilot_20200116.html
  • results/viral_fastq10x/viraltag_counts_pilot_20200116.csv
count_viralbc_fastq10x 2
  • results/viral_fastq10x/count_viralbc_fastq10x-wt_virus_pilot.ipynb
  • results/viral_fastq10x/count_viralbc_fastq10x-wt_virus_pilot.html
  • results/viral_fastq10x/viralbc_counts_wt_virus_pilot.csv
  • results/viral_fastq10x/count_viralbc_fastq10x-pilot_20200116.ipynb
  • results/viral_fastq10x/count_viralbc_fastq10x-pilot_20200116.html
  • results/viral_fastq10x/viralbc_counts_pilot_20200116.csv
analyze_cell_gene_matrix 2
  • results/analysis/wt_virus_pilot_analyze_cell_gene_matrix.ipynb
  • results/analysis/wt_virus_pilot_analyze_cell_gene_matrix.html
  • results/analysis/pilot_20200116_analyze_cell_gene_matrix.ipynb
  • results/analysis/pilot_20200116_analyze_cell_gene_matrix.html
make_fastq10x 3
  • results/fastq10x/wt_virus_pilot-2019-12-03_all_R1.fastq.gz
  • results/fastq10x/wt_virus_pilot-2019-12-03_all_R2.fastq.gz
  • results/fastq10x/mkfastq_output/wt_virus_pilot-2019-12-03
  • results/fastq10x/wt_virus_pilot-2019-12-03_qc_stats.csv
  • _mkfastq_wt_virus_pilot-2019-12-03.csv
  • __wt_virus_pilot-2019-12-03.mro
  • results/fastq10x/pilot_20200116-2020-01-16_all_R1.fastq.gz
  • results/fastq10x/pilot_20200116-2020-01-16_all_R2.fastq.gz
  • results/fastq10x/mkfastq_output/pilot_20200116-2020-01-16
  • results/fastq10x/pilot_20200116-2020-01-16_qc_stats.csv
  • _mkfastq_pilot_20200116-2020-01-16.csv
  • __pilot_20200116-2020-01-16.mro
  • results/fastq10x/pilot_20200116-2020-02-18_all_R1.fastq.gz
  • results/fastq10x/pilot_20200116-2020-02-18_all_R2.fastq.gz
  • results/fastq10x/mkfastq_output/pilot_20200116-2020-02-18
  • results/fastq10x/pilot_20200116-2020-02-18_qc_stats.csv
  • _mkfastq_pilot_20200116-2020-02-18.csv
  • __pilot_20200116-2020-02-18.mro
align_fastq10x 2
  • results/aligned_fastq10x/wt_virus_pilot/Solo.out/Gene/Summary.csv
  • results/aligned_fastq10x/wt_virus_pilot/Solo.out/Gene/UMIperCellSorted.txt
  • results/aligned_fastq10x/wt_virus_pilot/Solo.out/Gene/filtered/matrix.mtx
  • results/aligned_fastq10x/wt_virus_pilot/Solo.out/Gene/filtered/features.tsv
  • results/aligned_fastq10x/wt_virus_pilot/Solo.out/Gene/filtered/barcodes.tsv
  • results/aligned_fastq10x/wt_virus_pilot/Aligned.sortedByCoord.out.bam
  • results/aligned_fastq10x/pilot_20200116/Solo.out/Gene/Summary.csv
  • results/aligned_fastq10x/pilot_20200116/Solo.out/Gene/UMIperCellSorted.txt
  • results/aligned_fastq10x/pilot_20200116/Solo.out/Gene/filtered/matrix.mtx
  • results/aligned_fastq10x/pilot_20200116/Solo.out/Gene/filtered/features.tsv
  • results/aligned_fastq10x/pilot_20200116/Solo.out/Gene/filtered/barcodes.tsv
  • results/aligned_fastq10x/pilot_20200116/Aligned.sortedByCoord.out.bam
index_bam 2
  • results/aligned_fastq10x/wt_virus_pilot/Aligned.sortedByCoord.out.bam.bai
  • results/aligned_fastq10x/pilot_20200116/Aligned.sortedByCoord.out.bam.bai
samtools index {input} {output}
make_refgenome 1
  • results/genomes/cell_and_virus_gtf.gtf
  • results/genomes/refgenome
        cat {input.cell_gtf} {input.viral_gtf} > {output.concat_gtf}
        mkdir -p {output.genomeDir}
        STAR --runThreadN {threads} \
             --runMode genomeGenerate \
             --genomeDir {output.genomeDir} \
             --genomeFastaFiles {input.cell_genome} {input.viral_genome} \
             --sjdbGTFfile {output.concat_gtf}
get_cb_whitelist_10x 1
  • results/aligned_fastq10x/cb_whitelist_10x.txt
        if [[ {params.url} == *.gz ]]
        then
            wget -O - {params.url} | gunzip -c > {output}
        else
            wget -O - {params.url} > {output}
        fi
        
get_cell_genome 1
  • results/genomes/cell_genome.fasta
wget -O - {params.ftp} | gunzip -c > {output}
get_cell_gtf 1
  • results/genomes/cell_gtf.gtf
wget -O - {params.ftp} | gunzip -c > {output}